Enabling vGPU in Proxmox VE 7.1
1: Understanding NVIDIA vGPU

The figure below shows how NVIDIA vGPU works. The vGPU driver is installed on the host, and the NVIDIA vGPU Manager controls the vGPUs. You then create several mdev devices (the vGPUs) and pass them through to virtual machines, where the regular NVIDIA guest driver drives each vGPU. The design is somewhat similar to Intel GVT-g, but the key component here is the NVIDIA vGPU Manager.

Once the NVIDIA vGPU driver is installed on the host, two services are present:

nvidia-vgpud.service
nvidia-vgpu-mgr.service

Briefly, here is what each service does when a vGPU starts:

1. With a genuine vGPU card, the normal flow is as follows. After boot, the nvidia-vgpud service queries every GPU known to the kernel and checks its vGPU capability. If it finds a vGPU-capable GPU, nvidia-vgpud sets up the MDEV device types and the system creates the /sys/class/mdev_bus directory.
2. These devices are then assigned to VMs. When a VM starts, it opens its MDEV device, and nvidia-vgpu-mgr talks to the kernel through ioctl calls. When nvidia-vgpu-mgr asks whether the GPU supports vGPU, the answer is yes, and it then tries to initialize the vGPU device.

At present the vgpu_unlock project only supports time-sliced vGPU, so each instance's share of the GPU is allocated dynamically. On a Tesla P4, for example, a single running vGPU instance can get close to 100% of the GPU's performance, while two instances running at the same time get about half each.

NVIDIA's vGPU limits require at least 1 GB of VRAM per vGPU instance. An 8 GB P4, for example, can run at most eight 1 GB instances at the same time.

2: How vgpu_unlock works

Recall the vGPU startup flow above. With a consumer card, the nvidia-vgpud service checks the card type, sees a consumer card, and therefore creates no mdev devices. vgpu_unlock intercepts nvidia-vgpud's calls and tricks it into believing the card is a vGPU card, so the mdev device information gets created.

When an mdev device is passed through to a VM and the VM starts, vgpu_unlock also intercepts the nvidia-vgpu-mgr service. It tells the service that the GPU supports vGPU, so the device gets initialized.

3: GPUs supported by vgpu_unlock

Each entry gives the PCI device ID, the chip and product name, and the vGPU card that vgpu_unlock treats it as:

```text
[21c4] TU116 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000
[21d1] TU116BM [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000
[21c2] TU116 -> Quadro RTX 6000
[2182] TU116 [GeForce GTX 1660 Ti] -> Quadro RTX 6000
[2183] TU116 -> Quadro RTX 6000
[2184] TU116 [GeForce GTX 1660] -> Quadro RTX 6000
[2187] TU116 [GeForce GTX 1650 SUPER] -> Quadro RTX 6000
[2188] TU116 [GeForce GTX 1650] -> Quadro RTX 6000
[2191] TU116M [GeForce GTX 1660 Ti Mobile] -> Quadro RTX 6000
[2192] TU116M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000
[21ae] TU116GL -> Quadro RTX 6000
[21bf] TU116GL -> Quadro RTX 6000
[2189] TU116 [CMP 30HX] -> Quadro RTX 6000
[1fbf] TU117GL -> Quadro RTX 6000
[1fbb] TU117GLM [Quadro T500 Mobile] -> Quadro RTX 6000
[1fd9] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000
[1ff9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1fdd] TU117BM [GeForce GTX 1650 Mobile Refresh] -> Quadro RTX 6000
[1f96] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1f99] TU117M -> Quadro RTX 6000
[1fae] TU117GL -> Quadro RTX 6000
[1fb8] TU117GLM [Quadro T2000 Mobile / Max-Q] -> Quadro RTX 6000
[1fb9] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1f97] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f98] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f9c] TU117M [GeForce MX450] -> Quadro RTX 6000
[1f9d] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1fb0] TU117GLM [Quadro T1000 Mobile] -> Quadro RTX 6000
[1fb1] TU117GL [T600] -> Quadro RTX 6000
[1fb2] TU117GLM [Quadro T400 Mobile] -> Quadro RTX 6000
[1fba] TU117GLM [T600 Mobile] -> Quadro RTX 6000
[1f42] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1f47] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1f50] TU106BM [GeForce RTX 2070 Mobile / Max-Q] -> Quadro RTX 6000
[1f51] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f54] TU106BM [GeForce RTX 2070 Mobile] -> Quadro RTX 6000
[1f55] TU106BM [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f81] TU117 -> Quadro RTX 6000
[1f82] TU117 [GeForce GTX 1650] -> Quadro RTX 6000
[1f91] TU117M [GeForce GTX 1650 Mobile / Max-Q] -> Quadro RTX 6000
[1f92] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000
[1f94] TU117M [GeForce GTX 1650 Mobile] -> Quadro RTX 6000
[1f95] TU117M [GeForce GTX 1650 Ti Mobile] -> Quadro RTX 6000
[1f76] TU106GLM [Quadro RTX 3000 Mobile Refresh] -> Quadro RTX 6000
[1f07] TU106 [GeForce RTX 2070 Rev. A] -> Quadro RTX 6000
[1f08] TU106 [GeForce RTX 2060 Rev. A] -> Quadro RTX 6000
[1f09] TU106 [GeForce GTX 1660 SUPER] -> Quadro RTX 6000
[1f0a] TU106 [GeForce GTX 1650] -> Quadro RTX 6000
[1f10] TU106M [GeForce RTX 2070 Mobile] -> Quadro RTX 6000
[1f11] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f12] TU106M [GeForce RTX 2060 Max-Q] -> Quadro RTX 6000
[1f14] TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] -> Quadro RTX 6000
[1f15] TU106M [GeForce RTX 2060 Mobile] -> Quadro RTX 6000
[1f2e] TU106M -> Quadro RTX 6000
[1f36] TU106GLM [Quadro RTX 3000 Mobile / Max-Q] -> Quadro RTX 6000
[1f0b] TU106 [CMP 40HX] -> Quadro RTX 6000
[1eb5] TU104GLM [Quadro RTX 5000 Mobile / Max-Q] -> Quadro RTX 6000
[1eb6] TU104GLM [Quadro RTX 4000 Mobile / Max-Q] -> Quadro RTX 6000
[1eb8] TU104GL [Tesla T4] -> Quadro RTX 6000
[1eb9] TU104GL -> Quadro RTX 6000
[1ebe] TU104GL -> Quadro RTX 6000
[1ec2] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1ec7] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1ed0] TU104BM [GeForce RTX 2080 Mobile] -> Quadro RTX 6000
[1ed1] TU104BM [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1ed3] TU104BM [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1f02] TU106 [GeForce RTX 2070] -> Quadro RTX 6000
[1f04] TU106 -> Quadro RTX 6000
[1f06] TU106 [GeForce RTX 2060 SUPER] -> Quadro RTX 6000
[1ef5] TU104GLM [Quadro RTX 5000 Mobile Refresh] -> Quadro RTX 6000
[1e81] TU104 [GeForce RTX 2080 SUPER] -> Quadro RTX 6000
[1e82] TU104 [GeForce RTX 2080] -> Quadro RTX 6000
[1e84] TU104 [GeForce RTX 2070 SUPER] -> Quadro RTX 6000
[1e87] TU104 [GeForce RTX 2080 Rev. A] -> Quadro RTX 6000
[1e89] TU104 [GeForce RTX 2060] -> Quadro RTX 6000
[1e90] TU104M [GeForce RTX 2080 Mobile] -> Quadro RTX 6000
[1e91] TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1e93] TU104M [GeForce RTX 2080 SUPER Mobile / Max-Q] -> Quadro RTX 6000
[1eab] TU104M -> Quadro RTX 6000
[1eae] TU104M -> Quadro RTX 6000
[1eb0] TU104GL [Quadro RTX 5000] -> Quadro RTX 6000
[1eb1] TU104GL [Quadro RTX 4000] -> Quadro RTX 6000
[1eb4] TU104GL [T4G] -> Quadro RTX 6000
[1e04] TU102 [GeForce RTX 2080 Ti] -> Quadro RTX 6000
[1e07] TU102 [GeForce RTX 2080 Ti Rev. A] -> Quadro RTX 6000
[1e2d] TU102 [GeForce RTX 2080 Ti Engineering Sample] -> Quadro RTX 6000
[1e2e] TU102 [GeForce RTX 2080 Ti 12GB Engineering Sample] -> Quadro RTX 6000
[1e30] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000
[1e36] TU102GL [Quadro RTX 6000] -> Quadro RTX 6000
[1e37] TU102GL [GRID RTX T10-4/T10-8/T10-16] -> Quadro RTX 6000
[1e38] TU102GL -> Quadro RTX 6000
[1e3c] TU102GL -> Quadro RTX 6000
[1e3d] TU102GL -> Quadro RTX 6000
[1e3e] TU102GL -> Quadro RTX 6000
[1e78] TU102GL [Quadro RTX 6000/8000] -> Quadro RTX 6000
[1e09] TU102 [CMP 50HX] -> Quadro RTX 6000
[1dba] GV100GL [Quadro GV100] -> Tesla V100 32GB PCIE
[1e02] TU102 [TITAN RTX] -> Quadro RTX 6000
[1cfa] GP107GL [Quadro P2000] -> Tesla P40
[1cfb] GP107GL [Quadro P1000] -> Tesla P40
[1d01] GP108 [GeForce GT 1030] -> Tesla P40
[1d10] GP108M [GeForce MX150] -> Tesla P40
[1d11] GP108M [GeForce MX230] -> Tesla P40
[1d12] GP108M [GeForce MX150] -> Tesla P40
[1d13] GP108M [GeForce MX250] -> Tesla P40
[1d16] GP108M [GeForce MX330] -> Tesla P40
[1d33] GP108GLM [Quadro P500 Mobile] -> Tesla P40
[1d34] GP108GLM [Quadro P520] -> Tesla P40
[1d52] GP108BM [GeForce MX250] -> Tesla P40
[1d56] GP108BM [GeForce MX330] -> Tesla P40
[1d81] GV100 [TITAN V] -> Tesla V100 32GB PCIE
[1cb6] GP107GL [Quadro P620] -> Tesla P40
[1cba] GP107GLM [Quadro P2000 Mobile] -> Tesla P40
[1cbb] GP107GLM [Quadro P1000 Mobile] -> Tesla P40
[1cbc] GP107GLM [Quadro P600 Mobile] -> Tesla P40
[1cbd] GP107GLM [Quadro P620] -> Tesla P40
[1ccc] GP107BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1ccd] GP107BM [GeForce GTX 1050 Mobile] -> Tesla P40
[1ca8] GP107GL -> Tesla P40
[1caa] GP107GL -> Tesla P40
[1cb1] GP107GL [Quadro P1000] -> Tesla P40
[1cb2] GP107GL [Quadro P600] -> Tesla P40
[1cb3] GP107GL [Quadro P400] -> Tesla P40
[1c70] GP106GL -> Tesla P40
[1c81] GP107 [GeForce GTX 1050] -> Tesla P40
[1c82] GP107 [GeForce GTX 1050 Ti] -> Tesla P40
[1c83] GP107 [GeForce GTX 1050 3GB] -> Tesla P40
[1c8c] GP107M [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c8d] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c8e] GP107M -> Tesla P40
[1c8f] GP107M [GeForce GTX 1050 Ti Max-Q] -> Tesla P40
[1c90] GP107M [GeForce MX150] -> Tesla P40
[1c91] GP107M [GeForce GTX 1050 3 GB Max-Q] -> Tesla P40
[1c92] GP107M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c94] GP107M [GeForce MX350] -> Tesla P40
[1c96] GP107M [GeForce MX350] -> Tesla P40
[1ca7] GP107GL -> Tesla P40
[1c36] GP106 [P106M] -> Tesla P40
[1c07] GP106 [P106-100] -> Tesla P40
[1c09] GP106 [P106-090] -> Tesla P40
[1c20] GP106M [GeForce GTX 1060 Mobile] -> Tesla P40
[1c21] GP106M [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c22] GP106M [GeForce GTX 1050 Mobile] -> Tesla P40
[1c23] GP106M [GeForce GTX 1060 Mobile Rev. 2] -> Tesla P40
[1c2d] GP106M -> Tesla P40
[1c30] GP106GL [Quadro P2000] -> Tesla P40
[1c31] GP106GL [Quadro P2200] -> Tesla P40
[1c35] GP106M [Quadro P2000 Mobile] -> Tesla P40
[1c60] GP106BM [GeForce GTX 1060 Mobile 6GB] -> Tesla P40
[1c61] GP106BM [GeForce GTX 1050 Ti Mobile] -> Tesla P40
[1c62] GP106BM [GeForce GTX 1050 Mobile] -> Tesla P40
[1bb8] GP104GLM [Quadro P3000 Mobile] -> Tesla P40
[1bb9] GP104GLM [Quadro P4200 Mobile] -> Tesla P40
[1bbb] GP104GLM [Quadro P3200 Mobile] -> Tesla P40
[1bc7] GP104 [P104-101] -> Tesla P40
[1be0] GP104BM [GeForce GTX 1080 Mobile] -> Tesla P40
[1be1] GP104BM [GeForce GTX 1070 Mobile] -> Tesla P40
[1c00] GP106 -> Tesla P40
[1c01] GP106 -> Tesla P40
[1c02] GP106 [GeForce GTX 1060 3GB] -> Tesla P40
[1c03] GP106 [GeForce GTX 1060 6GB] -> Tesla P40
[1c04] GP106 [GeForce GTX 1060 5GB] -> Tesla P40
[1c06] GP106 [GeForce GTX 1060 6GB Rev. 2] -> Tesla P40
[1b87] GP104 [P104-100] -> Tesla P40
[1ba0] GP104M [GeForce GTX 1080 Mobile] -> Tesla P40
[1ba1] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40
[1ba2] GP104M [GeForce GTX 1070 Mobile] -> Tesla P40
[1ba9] GP104M -> Tesla P40
[1baa] GP104M -> Tesla P40
[1bad] GP104 [GeForce GTX 1070 Engineering Sample] -> Tesla P40
[1bb0] GP104GL [Quadro P5000] -> Tesla P40
[1bb1] GP104GL [Quadro P4000] -> Tesla P40
[1bb3] GP104GL [Tesla P4] -> Tesla P40
[1bb4] GP104GL [Tesla P6] -> Tesla P40
[1bb5] GP104GLM [Quadro P5200 Mobile] -> Tesla P40
[1bb6] GP104GLM [Quadro P5000 Mobile] -> Tesla P40
[1bb7] GP104GLM [Quadro P4000 Mobile] -> Tesla P40
[1b06] GP102 [GeForce GTX 1080 Ti] -> Tesla P40
[1b07] GP102 [P102-100] -> Tesla P40
[1b30] GP102GL [Quadro P6000] -> Tesla P40
[1b38] GP102GL [Tesla P40] -> Tesla P40
[1b70] GP102GL -> Tesla P40
[1b78] GP102GL -> Tesla P40
[1b80] GP104 [GeForce GTX 1080] -> Tesla P40
[1b81] GP104 [GeForce GTX 1070] -> Tesla P40
[1b82] GP104 [GeForce GTX 1070 Ti] -> Tesla P40
[1b83] GP104 [GeForce GTX 1060 6GB] -> Tesla P40
[1b84] GP104 [GeForce GTX 1060 3GB] -> Tesla P40
[1b39] GP102GL [Tesla P10] -> Tesla P40
[1b00] GP102 [TITAN X] -> Tesla P40
[1b01] GP102 [GeForce GTX 1080 Ti 10GB] -> Tesla P40
[1b02] GP102 [TITAN Xp] -> Tesla P40
[1b04] GP102 -> Tesla P40
[179c] GM107 [GeForce 940MX] -> Tesla M10
[17c2] GM200 [GeForce GTX TITAN X] -> Tesla M60
[17c8] GM200 [GeForce GTX 980 Ti] -> Tesla M60
[17f0] GM200GL [Quadro M6000] -> Tesla M60
[17f1] GM200GL [Quadro M6000 24GB] -> Tesla M60
[17fd] GM200GL [Tesla M40] -> Tesla M60
[1617] GM204M [GeForce GTX 980M] -> Tesla M60
[1618] GM204M [GeForce GTX 970M] -> Tesla M60
[1619] GM204M [GeForce GTX 965M] -> Tesla M60
[161a] GM204M [GeForce GTX 980 Mobile] -> Tesla M60
[1667] GM204M [GeForce GTX 965M] -> Tesla M60
[1725] GP100 -> Tesla P40
[172e] GP100 -> Tesla P40
[172f] GP100 -> Tesla P40
[174d] GM108M [GeForce MX130] -> Tesla M10
[174e] GM108M [GeForce MX110] -> Tesla M10
[1789] GM107GL [GRID M3-3020] -> Tesla M10
[1402] GM206 [GeForce GTX 950] -> Tesla M60
[1406] GM206 [GeForce GTX 960 OEM] -> Tesla M60
[1407] GM206 [GeForce GTX 750 v2] -> Tesla M60
[1427] GM206M [GeForce GTX 965M] -> Tesla M60
[1430] GM206GL [Quadro M2000] -> Tesla M60
[1431] GM206GL [Tesla M4] -> Tesla M60
[1436] GM206GLM [Quadro M2200 Mobile] -> Tesla M60
[15f0] GP100GL [Quadro GP100] -> Tesla P40
[15f1] GP100GL -> Tesla P40
[1404] GM206 [GeForce GTX 960 FAKE] -> Tesla M60
[13d8] GM204M [GeForce GTX 970M] -> Tesla M60
[13d9] GM204M [GeForce GTX 965M] -> Tesla M60
[13da] GM204M [GeForce GTX 980 Mobile] -> Tesla M60
[13e7] GM204GL [GeForce GTX 980 Engineering Sample] -> Tesla M60
[13f0] GM204GL [Quadro M5000] -> Tesla M60
[13f1] GM204GL [Quadro M4000] -> Tesla M60
[13f2] GM204GL [Tesla M60] -> Tesla M60
[13f3] GM204GL [Tesla M6] -> Tesla M60
[13f8] GM204GLM [Quadro M5000M / M5000 SE] -> Tesla M60
[13f9] GM204GLM [Quadro M4000M] -> Tesla M60
[13fa] GM204GLM [Quadro M3000M] -> Tesla M60
[13fb] GM204GLM [Quadro M5500] -> Tesla M60
[1401] GM206 [GeForce GTX 960] -> Tesla M60
[13b3] GM107GLM [Quadro K2200M] -> Tesla M10
[13b4] GM107GLM [Quadro M620 Mobile] -> Tesla M10
[13b6] GM107GLM [Quadro M1200 Mobile] -> Tesla M10
[13b9] GM107GL [NVS 810] -> Tesla M10
[13ba] GM107GL [Quadro K2200] -> Tesla M10
[13bb] GM107GL [Quadro K620] -> Tesla M10
[13bc] GM107GL [Quadro K1200] -> Tesla M10
[13bd] GM107GL [Tesla M10] -> Tesla M10
[13c0] GM204 [GeForce GTX 980] -> Tesla M60
[13c1] GM204 -> Tesla M60
[13c2] GM204 [GeForce GTX 970] -> Tesla M60
[13c3] GM204 -> Tesla M60
[13d7] GM204M [GeForce GTX 980M] -> Tesla M60
[1389] GM107GL [GRID M30] -> Tesla M10
[1390] GM107M [GeForce 845M] -> Tesla M10
[1391] GM107M [GeForce GTX 850M] -> Tesla M10
[1392] GM107M [GeForce GTX 860M] -> Tesla M10
[1393] GM107M [GeForce 840M] -> Tesla M10
[1398] GM107M [GeForce 845M] -> Tesla M10
[1399] GM107M [GeForce 945M] -> Tesla M10
[139a] GM107M [GeForce GTX 950M] -> Tesla M10
[139b] GM107M [GeForce GTX 960M] -> Tesla M10
[139c] GM107M [GeForce 940M] -> Tesla M10
[139d] GM107M [GeForce GTX 750 Ti] -> Tesla M10
[13b0] GM107GLM [Quadro M2000M] -> Tesla M10
[13b1] GM107GLM [Quadro M1000M] -> Tesla M10
[13b2] GM107GLM [Quadro M600M] -> Tesla M10
[1347] GM108M [GeForce 940M] -> Tesla M10
[1348] GM108M [GeForce 945M / 945A] -> Tesla M10
[1349] GM108M [GeForce 930M] -> Tesla M10
[134b] GM108M [GeForce 940MX] -> Tesla M10
[134d] GM108M [GeForce 940MX] -> Tesla M10
[134e] GM108M [GeForce 930MX] -> Tesla M10
[134f] GM108M [GeForce 920MX] -> Tesla M10
[137a] GM108GLM [Quadro K620M / Quadro M500M] -> Tesla M10
[137b] GM108GLM [Quadro M520 Mobile] -> Tesla M10
[137d] GM108M [GeForce 940A] -> Tesla M10
[1380] GM107 [GeForce GTX 750 Ti] -> Tesla M10
[1381] GM107 [GeForce GTX 750] -> Tesla M10
[1382] GM107 [GeForce GTX 745] -> Tesla M10
[1340] GM108M [GeForce 830M] -> Tesla M10
[1341] GM108M [GeForce 840M] -> Tesla M10
[1344] GM108M [GeForce 845M] -> Tesla M10
[1346] GM108M [GeForce 930M] -> Tesla M10
```

4: Preparing the environment

4.1 Configure the package sources

```bash
# Replace the existing APT sources (this also removes the default pve-enterprise.list)
rm /etc/apt/sources.list
rm /etc/apt/sources.list.d/*
# Debian bullseye and the Proxmox no-subscription repository via the TUNA (Tsinghua) mirror
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-backports main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bullseye-security main contrib non-free">>/etc/apt/sources.list
echo "deb https://mirrors.tuna.tsinghua.edu.cn/proxmox/debian bullseye pve-no-subscription">>/etc/apt/sources.list
```

4.2 Install the required packages

```bash
# Build tools, the 5.15 PVE kernel with headers, and the Rust toolchain for vgpu_unlock-rs
apt update && apt install dkms git build-essential pve-kernel-5.15 pve-headers-5.15 cargo jq uuid-runtime -y
```

Install mdevctl:

```bash
wget -P /opt/ http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
dpkg -i /opt/mdevctl_0.81-1_all.deb
```

4.3 Configure the kernel

```bash
# Load the VFIO modules at boot
echo vfio >> /etc/modules
echo vfio_iommu_type1 >> /etc/modules
echo vfio_pci >> /etc/modules
echo vfio_virqfd >> /etc/modules
# Keep the open-source nouveau driver from claiming the GPU
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
# Ignore unhandled MSR accesses from guests
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
# Rebuild the initramfs for all installed kernels
update-initramfs -k all -u
```

4.4 Configure the bootloader

Edit the GRUB configuration. Do not change it blindly; pick the setting that matches your CPU.

```bash
vi /etc/default/grub
# Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# and change it to (Intel CPU):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# or, for an AMD CPU:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
# Regenerate the boot configuration
update-grub
```

4.5 Install the driver

Reboot the host. Once it is back up, check that the running kernel is 5.15:

```text
root@pve:~# uname -r
5.15.30-2-pve
```

If the version starts with 5.15, the kernel is correct.

Next, verify that the IOMMU is enabled. If IOMMU groups are listed as below, it is working:

```text
root@pve3:~# dmesg |grep iommu
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.075784] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[ 0.352588] iommu: Default domain type: Passthrough (set via kernel command line)
[ 1.373583] pci 0000:00:00.0: Adding to iommu group 0
[ 1.373592] pci 0000:00:02.0: Adding to iommu group 1
[ 1.373605] pci 0000:00:14.0: Adding to iommu group 2
[ 1.373613] pci 0000:00:17.0: Adding to iommu group 3
[ 1.373623] pci 0000:00:1c.0: Adding to iommu group 4
[ 1.373637] pci 0000:00:1d.0: Adding to iommu group 5
[ 1.373647] pci 0000:00:1d.2: Adding to iommu group 6
[ 1.373656] pci 0000:00:1d.3: Adding to iommu group 7
[ 1.373675] pci 0000:00:1f.0: Adding to iommu group 8
[ 1.373683] pci 0000:00:1f.2: Adding to iommu group 8
[ 1.373691] pci 0000:00:1f.3: Adding to iommu group 8
[ 1.373699] pci 0000:00:1f.4: Adding to iommu group 8
[ 1.373707] pci 0000:00:1f.6: Adding to iommu group 9
[ 1.373717] pci 0000:01:00.0: Adding to iommu group 10
[ 1.373726] pci 0000:03:00.0: Adding to iommu group 11
[ 1.373735] pci 0000:05:00.0: Adding to iommu group 12
[ 1.656483] intel_iommu=on
```

Then verify that nouveau is not loaded. No output means it is not loaded:

```text
root@pve3:~# lsmod|grep nouveau
root@pve3:~#
```

Download the driver:

```bash
# Download the driver into /opt
wget https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run -P /opt
```

Make it executable:

```bash
chmod +x /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run
```

Install the driver with DKMS:

```bash
sh -c /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run
```

The installer first asks whether to register the kernel module with DKMS; choose Yes and press Enter. An X.org warning appears next, which you can ignore. It then asks whether to install the 32-bit compatibility libraries; either answer is fine. The driver installation then starts. It is finished when the progress bar completes, which may take a while.

5: Configuring vgpu_unlock

5.1 Build

Download vgpu_unlock-rs:

```bash
cd /opt/ && git clone https://github.com/mbilker/vgpu_unlock-rs.git
```

Build it with cargo:

```bash
cd /opt/vgpu_unlock-rs && git checkout v2.0.1 && cargo build --release
```

The build takes a long time and may need network access. You can also build it in the cloud with GitHub Actions.

5.2 Install vgpu_unlock

```bash
cp /opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so /lib/nvidia/libvgpu_unlock_rs.so
```

Then reboot the host.

6: Verification

After the reboot, run nvidia-smi and confirm that it shows the GPU, as below:

```text
root@pve:~# nvidia-smi
Wed Apr 27 23:33:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  P106-090            Off  | 00000000:05:00.0 Off |                  N/A |
| 31%   35C    P0    28W /  75W |     11MiB /  3071MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Then run mdevctl types to verify that mdev (vGPU) types are listed:

```text
root@pve:/opt/vgpu_unlock-rs# mdevctl types
0000:05:00.0
  nvidia-156
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-241
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1B4
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-283
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4C
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
  nvidia-284
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6C
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
  nvidia-285
    Available instances: 3
    Device API: vfio-pci
    Name: GRID P40-8C
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
  nvidia-286
    Available instances: 2
    Device API: vfio-pci
    Name: GRID P40-12C
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
```

7: Getting started

7.1 Configure vGPU parameters (optional; skip this if you use the stock vGPU profiles)
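```bash
# Create the configuration directory
mkdir /etc/vgpu_unlock
# Create the vGPU profile override file
touch /etc/vgpu_unlock/profile_override.toml
```

Write the vGPU settings into /etc/vgpu_unlock/profile_override.toml. The nvidia-vgpu-mgr service reads this file each time a vGPU device starts. A change therefore takes effect the next time the vGPU is started.

```toml
[profile.nvidia-18]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 0
framebuffer = 12348030976
pci_id = 0x17F011A9
pci_device_id = 0x17F0
```

Parameter notes:

- [profile.nvidia-18]: this section applies to the nvidia-18 vGPU type. To configure type nvidia-46 instead, change it to [profile.nvidia-46]. See 7.2.
- num_displays: the maximum number of displays.
- display_width = 1920, display_height = 1080, max_pixels = 2073600: the resolution of the virtual display. max_pixels is width × height (1920 × 1080 = 2073600).
- cuda_enabled = 1: whether CUDA is enabled.
- frl_enabled = 0: whether the frame rate limiter is on; 0 means no limit. The cap itself (for example 60, 144 or 244 fps) appears as frl_config in the profile; compare frl_config=45 and frl_config=60 in the mdevctl types descriptions.
- framebuffer: the VRAM size in bytes; see 7.1.1.
- pci_id: the SDID and SVID combined; see 7.1.2.
- pci_device_id: the DID (device ID); see 7.1.2.

7.1.1 framebuffer

framebuffer is the amount of vGPU VRAM set by the vGPU manager, in bytes. You can convert units with an online converter such as BeJSON.com's file size converter (bit, bytes, KB, MB, GB, TB).

Note: a vGPU reserves 128 MB by default. If you change the VRAM size, subtract 128 MB from the size you want before converting. For 2048 MB of VRAM, for example, use 2048 - 128 = 1920 MB. In bytes, which is the value needed here, that is 1920 × 1024 × 1024 = 2013265920.

Warning: do not change the VRAM size unless you have to; otherwise the mdev device may fail to initialize.

7.1.2 pci_id and pci_device_id

Normally, a vGPU device passed through to a VM carries the vGPU's own device ID, so the guest identifies it as a model such as P40-1A or RTX6000-1A. Installing the NVIDIA vGPU guest driver then makes it work as a vGPU device, including license management.

vGPU cards and ordinary consumer cards share the same GPU cores; only the drivers differ, and that is why their features differ. The vgpu_unlock project exists to close this gap by letting consumer cards support vGPU. That covers the host side.

On the VM side, a vGPU's core is the same as the physical card's core. In theory, then, changing the vGPU's device ID to a consumer card's ID should also let a regular driver drive it. However, consumer cards lack some professional features, so it is better to change the vGPU's device ID to that of a professional card.

The settings pci_id = 0x17F011A9 and pci_device_id = 0x17F0 in the configuration file do exactly that. The vGPU manager reads them and rewrites the vGPU's device information, which makes the device more stable and realistic.

pci_device_id: the device ID the vGPU will report. Look it up at https://devicehunt.com/view/type/pci/vendor/10DE/ . The goal is to rewrite the vGPU information so that the guest recognizes it as a professional card. This bypasses the vGPU driver restrictions and removes the need for a license. Choose the device ID based on the core of your physical card.

Suppose, for example, that you use a GTX 1080 for vGPU. The site above shows that its core is GP104, so you should pick a card with a GP104GL core, namely the Quadro P5000 or Quadro P4000. For a 1080 you would therefore set pci_device_id = 0x1BB0.

pci_id: the SDID combined with the SVID. The relationship between pci_id and pci_device_id is easiest to see in a figure:

[Figure: how pci_id and pci_device_id map to the PCI ID fields]

SDID is the subsystem device ID and can be the same as the DID. SVID is the subsystem vendor ID and can be the same as the VID. If you do not know these values, you can simply write pci_id = 0x1BB010DE.

7.2 vGPU types

mdevctl types prints a lot of information, including the vGPU types:

```text
root@pve2:/opt/vgpu_unlock# mdevctl types
0000:01:00.0
  nvidia-156
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
```

What do these fields mean? Take this example:

```text
  nvidia-257
    Available instances: 4
    Device API: vfio-pci
    Name: GRID RTX6000-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=4
```

- nvidia-257: the vGPU type
- Available instances: how many more devices of this type can be created
- Name: the display name
- Description: framebuffer is the VRAM and frl_config is probably the maximum fps, followed by the maximum resolution and the maximum number of instances

GRID RTX6000-2Q is the mdev's name. RTX6000 is the GPU name, 2 means 2 GB of VRAM, and Q means vWS. The last letter means:

A = Virtual Applications (vApps)
B = Virtual Desktops (vPC)
C = AI/Machine Learning/Training (vCS or vWS)
Q = Virtual Workstations (vWS) (best performance)

Each GPU model has its own set of vGPU types, for example P4-1B on a P4 or RTX6000-1B on an RTX 6000. The naming rules above always hold:

- By VRAM: P4-1B and P4-1Q both have 1 GB.
- By function: P4-1B is a vPC device and P4-1Q is a vDWS device, and they need different licenses.

At the virtualization level we only care about the vGPU type ID (here nvidia-257), and we need to choose the right one when configuring the vGPU.

[Figure: choosing a vGPU type]

As the figure above shows, the deployment takes three steps. Find the vGPU type you need in the mdevctl types output, set its parameters in profile_override.toml, and then configure the vGPU in the web UI.

7.4 Modify the VM configuration (required)

Add the following line to the VM's configuration file (/etc/pve/qemu-server/<VMID>.conf):

```text
args: -uuid 00000000-0000-0000-0000-000000000100
```

Change the final digits of the UUID to your VMID. If your VMID is 3333, use:

```text
args: -uuid 00000000-0000-0000-0000-000000003333
```

If your VMID is 121, use:

```text
args: -uuid 00000000-0000-0000-0000-000000000121
```

The length and format of the UUID must not change; replace only the trailing digits with your VMID.

7.5 Create the virtual machine

For vGPU, Windows version 21H1 or later is recommended.

7.5.1 Create the VM and install the OS

Create a VM. Either SeaBIOS or OVMF works, but the chipset (machine type) must be Q35. Switch to i440fx only if Q35 really does not work for you. Do not pass through any display device yet. A reference configuration:

[Figure: reference VM configuration]

Note: inside the guest the vGPU acts as a 3D device, so the VM still needs an additional display adapter. Do not set the display to "none" in the console settings.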
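The -uuid rule in 7.4 is easy to get wrong by hand: keep the UUID's length and format, and replace only the trailing digits with the VMID. Below is a minimal bash sketch that builds the line. It assumes the VMID is a plain decimal number of at most 12 digits, written verbatim into the last UUID group: zero-padded on the left and not converted to hex, as in the 3333 and 121 examples. vgpu_args_line is a hypothetical helper name, not a Proxmox or vgpu_unlock command.

```bash
#!/usr/bin/env bash
# Sketch: build the "args:" line from section 7.4 for a given Proxmox VMID.
# Assumption: the VMID is copied as decimal digits (not hex) into the last
# 12-character UUID group, left-padded with zeros, as in the examples above.
# vgpu_args_line is a hypothetical helper, not part of Proxmox or vgpu_unlock.

vgpu_args_line() {
    local vmid="$1"
    # Reject anything that is not 1 to 12 decimal digits.
    if [[ ! "$vmid" =~ ^[0-9]{1,12}$ ]]; then
        echo "invalid VMID: '$vmid'" >&2
        return 1
    fi
    # 10# forces base 10 so a VMID with leading zeros is not read as octal;
    # %012d pads it to the 12 characters of the last UUID group.
    printf 'args: -uuid 00000000-0000-0000-0000-%012d\n' "$((10#$vmid))"
}
```

For example, vgpu_args_line 121 prints args: -uuid 00000000-0000-0000-0000-000000000121. Before adding the line to /etc/pve/qemu-server/121.conf, check whether the file already has an args: line. If it does, add the -uuid option to that line instead of writing a second one.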
Once the OS is installed, enable a remote access method in the guest, such as Remote Desktop, ToDesk, VNC, Sunlogin or Parsec.

This matters because Windows 10 and similar systems go online and install drivers automatically. If the vGPU is passed through and the driver gets installed, the guest ends up with two displays. The VM console in the PVE web UI may then go black or show only a secondary screen, leaving you unable to operate the VM. If you fall into this trap, shut down the VM, detach the vGPU, and enable remote access first.

7.5.2 Pass through the vGPU device

In the VM's Hardware panel, click Add > PCI Device and tick All Functions and PCI-Express. In the MDev Type list, choose the vGPU type; see 7.2 above for how to choose.

The final VM configuration looks like this:

[Figure: final VM configuration]

You can now start the VM. If you followed this guide exactly, nothing unexpected should happen.

You may see this message:

```text
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:05:00.0/00000000-0000-0000-0000-000000003561,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: warning: vfio 00000000-0000-0000-0000-000000003561: Could not enable error recovery for the device
TASK OK
```

If so, ignore it. It is only a warning, and the task still ends with TASK OK.

7.5.3 Install the graphics driver

I have put guest drivers with good compatibility on my network drive: https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/guestdrivers/ . Download the one that fits your setup.

7.6 Accessing the VM

Once the driver is installed, Device Manager should show the vGPU device emulating a professional card, and the guest will also have two displays.

Because the vGPU is virtual, it cannot output to a physical monitor, so access the VM over a remote protocol. Parsec is recommended for streaming, but it depends on NVENC. If your card has no NVENC (the P106, for example), you cannot use Parsec.

With two displays, it is recommended to show only the vGPU display. Below is a screenshot of a Ludashi benchmark run over VNC.

[Figure: Ludashi benchmark over VNC]

9: Troubleshooting

Troubleshooting requires knowledge of KVM, vGPU and Linux basics.

As mentioned at the start, vGPU has two services. You can view their logs with these two commands:

```bash
journalctl -u nvidia-vgpud
journalctl -u nvidia-vgpu-mgr
```

For example, here is the vGPU initialization part:

```text
Apr 28 00:15:58 pve nvidia-vgpu-mgr[2534]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
# Creating the vGPU device
Apr 28 00:20:17 pve nvidia-vgpu-mgr[2534]: VgpuStart { uuid: {00000000-0000-0000-0000-000000003561}, config_params: "vgpu_type_id=46", unknown_410: [75, 13, 0, 0, 0, 5, 0, 0, 1, 0, 0, 0, 0, 5, 0, 0], }
# Default vGPU configuration
Apr 28 00:20:17 pve nvidia-vgpu-mgr[3528]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
...skipping...
num_displays: 4, display_width: 5120, display_height: 2880, max_pixels: 17694720, frl_config: 60, cuda_enabled: 1, ecc_supported: 1, mig_instance_size: 0, multi_vgpu_supported: 0, pci_id: 0x1b3811e8, pci_device_id: 0x1b38, framebuffer: 0x38000000, mappable_video_size: 0x400000, framebuffer_reservation: 0x8000000, encoder_capacity: 0x64, bar1_length: 0x100, blob: [71, 82, 73, 68, 32, 80, 52, 48, 45, 49, 81, 0, 96, 1, 0, 0, 8, 80, 244, 134, 2, 179, 255, 255, 0, 0, 0, 0, 96, 1, 0>
license_type: "NVIDIA RTX Virtual Workstation", }
# Reading /etc/vgpu_unlock/profile_override.toml and overriding the vGPU configuration
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Applying profile nvidia-46 overrides
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/num_displays: 4 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_width: 5120 -> 1920
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/display_height: 2880 -> 1080
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/max_pixels: 17694720 -> 2073600
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/cuda_enabled: 1 -> 1
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_id: 456659432 -> 472977896
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/pci_device_id: 6968 -> 7217
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: Patching nvidia-46/frl_enabled: 1 -> 0
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0xa0810115 failed.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Setting mappable_cpu_host_aperture to 10M
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): gpu-pci-id : 0x500
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Framebuffer: 0x38000000
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1c31:0x11e8
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## vGPU Manager Information: ########
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 460.73.01
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: op_type: 0x2080012f failed.
# vGPU information queried from within the VM
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Cannot query ECC status. vGPU ECC support will be disabled.
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): vGPU migration disabled
Apr 28 12:42:18 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: display_init inst: 0 successful
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: Driver Version: 453.10
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: vGPU version: 0x70001
Apr 28 12:42:49 pve nvidia-vgpu-mgr[26264]: notice: vmiop_log: (0x0): Current max guest pfn = 0x17cd58!
```
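Some values in logs like the ones above are easier to check once decoded. The bash sketch below is illustrative only and rests on the sample output, not on NVIDIA documentation. It makes three assumptions:

- The Patching lines print pci_id in decimal, with the SDID in the upper 16 bits and the SVID in the lower 16 bits. For example, 456659432 is 0x1b3811e8, the pci_id in the default profile dump.
- framebuffer plus framebuffer_reservation gives the profile's VRAM: 0x38000000 + 0x8000000 bytes = 1024 MiB, consistent with the 128 MB reservation described in the framebuffer notes.
- The Patching lines keep the "Patching <type>/<field>: <old> -> <new>" layout shown above.

All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Helpers for reading nvidia-vgpu-mgr logs such as the sample above.
# Assumptions drawn only from that sample (not from NVIDIA documentation):
#   - "Patching <type>/pci_id: A -> B" prints pci_id in decimal, with the
#     SDID in the upper 16 bits and the SVID in the lower 16 bits.
#   - framebuffer + framebuffer_reservation equals the profile's VRAM.
# All function names here are hypothetical.

# Print "<type> <field> <old> <new>" for every "Patching" line read on stdin.
patched_fields() {
    sed -n 's|.*Patching \([^/]*\)/\([a-z_]*\): \([0-9]*\) -> \([0-9]*\).*|\1 \2 \3 \4|p'
}

# Split a decimal pci_id into its SDID and SVID halves, printed in hex.
decode_pci_id() {
    local id=$(( 10#$1 ))
    printf 'sdid=0x%04x svid=0x%04x\n' $(( (id >> 16) & 0xffff )) $(( id & 0xffff ))
}

# Total VRAM in MiB from framebuffer and framebuffer_reservation
# (bash arithmetic accepts the 0x-prefixed hex values from the log).
vram_mib() {
    echo $(( ($1 + $2) / 1048576 ))
}

# Bytes to write as framebuffer in profile_override.toml for a desired VRAM size
# in MiB, after subtracting the 128 MiB reservation from the framebuffer notes.
override_framebuffer_bytes() {
    echo $(( ($1 - 128) * 1048576 ))
}
```

Some example uses:

- journalctl -u nvidia-vgpu-mgr | patched_fields lists each overridden field with its old and new value.
- decode_pci_id 472977896 prints sdid=0x1c31 svid=0x11e8, which matches the Virtual Device Id: 0x1c31:0x11e8 line.
- vram_mib 0x38000000 0x8000000 prints 1024.
- override_framebuffer_bytes 2048 prints 2013265920.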